Accurate measurements of system characteristics from limited amounts of data are important in many applications. Some adaptive control systems carry out measurements of plant parameters during normal operation. In communication systems an analogous situation arises in the utilization of time-varying channels (1). Often the application of a special test signal is undesirable so the information must be obtained from ordinary input-output data. The presence of random noise and instrumentation errors can render many measurement schemes ineffective. Statistical estimation theory provides powerful methods for dealing with this type of problem. Some of these methods are applied below to the estimation of the pulse transfer function of a linear system.

For control system applications Kalman (2) has shown that characterization of a linear system in terms of the coefficients of its pulse transfer function offers many advantages. Although the complete impulse response or frequency response also conveys the same information, a good estimate of one of these functions is not readily translated into a good estimate of another. In addition, the assumptions required to express these different functions in terms of a finite number of parameters suitable for estimation are generally not equivalent. Therefore the estimation procedure should be formulated directly in terms of the desired parameters.


Estimates of the coefficients of both conventional (Laplace transform) and pulse transfer functions have been considered by previous authors. Ellington and McCallion (3) and Shinbrot (4) applied non-linear curve fitting techniques to this problem. Corbin (5), Lendaris (6), and Zaborszky and Berger (7) obtained estimates by solving sets of simultaneous linear equations in derivatives and integrals of the input and output. Kalman (2) described a least squares fitting method which was investigated experimentally by Bigelow and Ruge (8). A similar technique was applied by Kaya and Yamamura (9). Joseph, Lewis, and Tou (10) used a closely related method which avoids bias errors due to correlated disturbances at input and output. Kushner (11) examined in detail the properties of a computationally simple recursive scheme.

Thus many different types of estimates have been examined. However, none of this previous work has attacked the fundamental question of what basic form the estimates should take to make optimum use of the data available. The present paper considers this question by assuming additive Gaussian noise and utilizing the method of maximum likelihood to derive estimates having certain optimal properties. An evaluation of the Cramér-Rao lower bound provides an approximation to the sampling variances. Due to mathematical difficulties a complete solution is obtained only for a suitably restricted formulation of the problem which does not exploit all the available information. However, the solution is easily modified to incorporate the remaining information. The results provide considerable insight into the properties of other previously suggested methods. For a similar analysis of impulse response estimation, see (12).

Details of some of the results below are contained in (13).

2.  Results from Mathematical Statistics

The expectation (mean value) is denoted by E. Consider a sequence of S independent vector random variables whose probability density is known except for a parameter $\alpha$. Let $\alpha_S$ denote any estimate of $\alpha$. The bias of $\alpha_S$ is $(E\alpha_S - \alpha)$ and the variance is

$$\operatorname{Var} \alpha_S = E(\alpha_S - E\alpha_S)^2$$

Under general regularity conditions, the minimum possible value of $\operatorname{Var} \alpha_S$ is given by the Cramér-Rao lower bound (14). $\alpha_S$ is said to be a consistent estimate if $\alpha_S$ converges in probability (p lim) to $\alpha$ as $S \to \infty$.


Cramér states, "From a theoretical point of view, the most important general method of estimation so far known is the method of maximum likelihood." Under general conditions maximum likelihood estimates, denoted as $\hat{\alpha}$, are consistent, asymptotically Gaussian and asymptotically efficient. This implies that as S becomes large $\hat{\alpha}$ converges in a certain sense to a Gaussian distribution with mean $\alpha$ and variance given by the Cramér-Rao lower bound. A useful property of maximum likelihood estimates (15) is that if $\phi = f(\alpha)$ and the transformation is one-to-one then $\hat{\phi} = f(\hat{\alpha})$. Analogous properties apply in the case of multiple parameters.

The following theorem of Slutsky (14) is used later: "If $\xi_n, \eta_n, \ldots, \rho_n$ are random variables converging in probability to the constants $x, y, \ldots, r$, respectively, any rational function $R(\xi_n, \eta_n, \ldots, \rho_n)$ converges in probability to the constant $R(x, y, \ldots, r)$ provided that the latter is finite."


♦  Operated  with  support  from  the  U.  S.  Army,  Navy  and 
Air  Force. 
3.  Assumptions

The situation analyzed is shown in Fig. 1. The following assumptions are initially made:

a) The input r(n) and output c(n) are sampled quantities with sampling interval unity.

b) r(n) and c(n) are related by a stable linear constant-coefficient difference equation,

$$c(n) + \beta_1 c(n-1) + \cdots + \beta_K c(n-K) - \alpha_0 r(n) - \alpha_1 r(n-1) - \cdots - \alpha_K r(n-K) = 0 \tag{1}$$

where K is known and the $\alpha_k$ and $\beta_k$ are to be estimated. The pulse transfer function is then

$$H(z) = \frac{\alpha_0 + \alpha_1 z^{-1} + \cdots + \alpha_K z^{-K}}{1 + \beta_1 z^{-1} + \cdots + \beta_K z^{-K}} \quad \text{with } z = e^{s}$$

c) The quantities

$$x(n) = r(n) + u(n), \qquad y(n) = c(n) + v(n)$$

are observed for $0 \le n \le N$.

d) The obscuring noise sequences u(n) and v(n) are each sequences of independent Gaussian random variables with mean zero and known variances $\sigma_u^2$ and $\sigma_v^2$ respectively. The covariance

$$\rho = E\,u(n)\,v(n)$$

is not necessarily zero and is known.

4.  Maximum Likelihood Estimates

The problem is now cast into a form for which maximum likelihood estimates can be obtained. Consider a P = 2K+2 dimensional Euclidean space with axes r(n), r(n-1), ..., r(n-K), c(n), c(n-1), ..., c(n-K). (If some of the coefficients are known to be zero the dimensionality of the space is correspondingly reduced.) For each value of n the corresponding set of values from the r(n) and c(n) sequences determines a point in this space. By virtue of (1) these points all lie in a hyperplane passing through the origin. If no noise is present any P-1 linearly independent points determine a hyperplane whose equation provides the P-1 values of the $\alpha_k$ and $\beta_k$. If noise is present and the x(n) and y(n) sequences are considered then the observed points are scattered about the hyperplane and a method of fitting a hyperplane to these points is required to estimate the coefficients.

Since the observed points have added random disturbances in all coordinate directions the standard least squares method, which assumes random errors along one coordinate axis only, is not entirely appropriate. The problem of fitting a hyperplane when the random errors occur along more than one coordinate axis has been examined by many authors (16, 17). The most pertinent analysis has been made by Koopmans (18), who derived the maximum likelihood solution for the case of Gaussian errors, independent from point to point. It turns out to be a generalization of the standard least squares fit.

Koopmans' results will now be applied to the estimation of the pulse transfer function. To obtain independent errors in each coordinate of each observed point the x(n) and y(n) sequences are initially split into non-overlapping sets. Each observed point consists of K+1 consecutive values of x(n) and the corresponding values of y(n). Take

$$x_0^{(1)} = x(K),\quad x_1^{(1)} = x(K-1),\ \ldots,\ x_K^{(1)} = x(0)$$

$$x_0^{(2)} = x(2K+1),\quad x_1^{(2)} = x(2K),\ \ldots,\ x_K^{(2)} = x(K+1)$$

$$x_k^{(s)} = x\big(s(K+1) - 1 - k\big)$$

etc. with a similar notation for y(n) and for the other sequences. Here s indexes the observed points and k indexes their coordinates. Let s = 1, 2, ..., S so that S is the total number of observed points where we must have

$$2K+1 \le S \le \frac{N+1}{K+1}$$

This notation is illustrated in Fig. 2 for K = 2.

The following vectors are now defined where T indicates the transpose:

$$\xi^{(s)} = \big[\,c_0^{(s)} \cdots c_K^{(s)} \;\; r_0^{(s)} \cdots r_K^{(s)}\,\big]^T$$

$$X^{(s)} = \big[\,y_0^{(s)} \cdots y_K^{(s)} \;\; x_0^{(s)} \cdots x_K^{(s)}\,\big]^T$$

$$\varepsilon^{(s)} = \big[\,v_0^{(s)} \cdots v_K^{(s)} \;\; u_0^{(s)} \cdots u_K^{(s)}\,\big]^T$$

$$\gamma = \big[\,1 \;\; \beta_1 \cdots \beta_K \;\; -\alpha_0 \cdots -\alpha_K\,\big]^T$$

Thus $X^{(s)} = \xi^{(s)} + \varepsilon^{(s)}$ and from (1)

$$\gamma^T \xi^{(s)} = 0 \quad \text{for } s = 1, 2, \ldots, S \tag{2}$$

Therefore the points $\xi^{(s)}$ lie in a hyperplane. The coefficients of the equation of this hyperplane are the elements of $\gamma$. Considered as a vector, $\gamma$ passes through the origin and is perpendicular to this hyperplane.
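The splitting of the data into non-overlapping observed points can be sketched in Python with NumPy. This is only an illustration: the function name `build_points` is hypothetical, and the coordinate ordering (the K+1 output samples first, then the K+1 input samples, newest sample first) is an assumption chosen to match the index rule $x_k^{(s)} = x(s(K+1)-1-k)$.

```python
import numpy as np

def build_points(x, y, K):
    """Split x(n), y(n) into non-overlapping observed points.

    Row s-1 holds [y_0, ..., y_K, x_0, ..., x_K] for point s, where
    coordinate k is the sample at index s*(K+1) - 1 - k (assumed layout).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    S = min(len(x), len(y)) // (K + 1)           # number of complete points
    points = np.empty((S, 2 * (K + 1)))
    for s in range(1, S + 1):
        idx = s * (K + 1) - 1 - np.arange(K + 1)  # newest sample first
        points[s - 1, :K + 1] = y[idx]
        points[s - 1, K + 1:] = x[idx]
    return points
```

For K = 2 and six samples this yields two points, the first built from samples 2, 1, 0 and the second from samples 5, 4, 3. In practice S must also be at least 2K+1 for the fit to be determined.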


For each observed point $X^{(s)}$ the random components have covariance matrix

$$\Sigma = E\,\varepsilon^{(s)} \varepsilon^{(s)T} = \begin{bmatrix} \sigma_v^2 I & \rho I \\ \rho I & \sigma_u^2 I \end{bmatrix}$$

where I is a (K+1) dimensional identity matrix. For all S points the probability density of the $X^{(s)}$ is the multivariate Gaussian distribution

$$p = (2\pi)^{-PS/2}\, |\Sigma|^{-S/2} \exp\Big\{ -\tfrac{1}{2} \sum_{s=1}^{S} \big(X^{(s)} - \xi^{(s)}\big)^T \Sigma^{-1} \big(X^{(s)} - \xi^{(s)}\big) \Big\} \tag{3}$$

Maximum likelihood estimates are those values of the unknown parameters maximizing the likelihood function. This function is obtained by substituting the observed values of the random variables into the probability density (3). In the present case this maximization is equivalent to the minimization of

$$D = \sum_{s=1}^{S} \big(X^{(s)} - \xi^{(s)}\big)^T \Sigma^{-1} \big(X^{(s)} - \xi^{(s)}\big)$$

The solution is complicated by the fact that the points $\xi^{(s)}$ are not explicit functions of $\gamma$ but are merely restricted by (2) to lie in a hyperplane with coefficients $\gamma$.

We now briefly sketch Koopmans' solution (18) for the maximum likelihood estimate $\hat{\gamma}$. The minimization of D is carried out in two steps. First, for any trial hyperplane with coefficients $\gamma$, points $w^{(s)}$ which lie in this hyperplane are substituted for the $\xi^{(s)}$ and those which minimize D are determined. It is found that the resulting value of D becomes

$$\min D(\gamma) = S\, \frac{\gamma^T A \gamma}{\gamma^T \Sigma \gamma} \tag{4}$$

where

$$A = \frac{1}{S} \sum_{s=1}^{S} X^{(s)} X^{(s)T} = \frac{1}{S} \begin{bmatrix} \sum (y_0^{(s)})^2 & \sum y_0^{(s)} y_1^{(s)} & \cdots & \sum y_0^{(s)} x_K^{(s)} \\ \sum y_1^{(s)} y_0^{(s)} & \sum (y_1^{(s)})^2 & \cdots & \sum y_1^{(s)} x_K^{(s)} \\ \vdots & \vdots & & \vdots \\ \sum x_K^{(s)} y_0^{(s)} & \sum x_K^{(s)} y_1^{(s)} & \cdots & \sum (x_K^{(s)})^2 \end{bmatrix}$$

and all sums run over s = 1, 2, ..., S. The elements of A are seen to be sums of cross-products of the x(n) and y(n) sequences. Also used later is the related matrix

$$M = \frac{1}{S} \sum_{s=1}^{S} \xi^{(s)} \xi^{(s)T}$$

Note that

$$E A = M + \Sigma \tag{5}$$

Second, the expression (4) is minimized with respect to $\gamma$ to provide $\hat{\gamma}$. This is done by employing the extremal properties of generalized eigenvectors (19). It is found that $\hat{\gamma}$ is given by the solution of P simultaneous linear equations

$$[A - \theta_1 \Sigma]\, \hat{\gamma} = 0 \tag{6}$$

where $\theta_1$ is the smallest value of $\theta$ satisfying the determinantal equation

$$|A - \theta \Sigma| = 0 \tag{7}$$

It can be shown that $\theta_1$ is non-negative. If $\Sigma = I$, $\theta_1$ is the smallest eigenvalue of A and $\hat{\gamma}$ is the corresponding eigenvector. Otherwise this is a generalized eigenvalue problem. Note that $\Sigma$ need be known only to within a constant multiplier. If $\Sigma$ is singular the derivation must be modified but the solution is still valid.

5.  Geometric Interpretation

It is now demonstrated that $\hat{\gamma}$ satisfies a generalized least squares fitting criterion. Define a generalized squared distance between any two points $X^{(s)}$ and $w^{(s)}$ as

$$d_s^2 = \sum_{i,j=1}^{P} \big(X_i^{(s)} - w_i^{(s)}\big)\, \Sigma^{ij}\, \big(X_j^{(s)} - w_j^{(s)}\big) \tag{8}$$

where the $\Sigma^{ij}$ are the elements of $\Sigma^{-1}$. Consider an observed point $X^{(s)}$, some trial hyperplane, and the "adjusted point" $w^{(s)}$ which lies in this hyperplane in such a position that $d_s^2$ is a minimum. For example, if $\Sigma^{-1}$ is the unit matrix, $d_s$ is the length of the perpendicular from $X^{(s)}$ to the hyperplane and $w^{(s)}$ is the point lying at the foot of this perpendicular. For a set of observed points the sum D of the $d_s^2$ depends upon the hyperplane. The generalized least squares criterion selects the hyperplane minimizing D.


The standard least squares fit along the $y_0$ axis corresponds to the matrix

$$\Sigma^{ij} = \begin{cases} 1 & i = 1,\ j = 1 \\ \infty & \text{otherwise} \end{cases}$$

The sum of squared deviations is measured along the (i = 1, j = 1) axis only. Deviations along any other axis are weighted by $\Sigma^{ij} = \infty$ and are therefore forced to be zero.

If the maximum likelihood estimates are to be reliable the observed points must not satisfy, even approximately, more than one relation of the type expressed by (2). In other words the observed points must not be concentrated in any linear subspace of dimension less than P - 1 or the hyperplane of best fit will not be well defined. This requires linear independence among the $r_k^{(s)}$ for each value of s and therefore it is necessary that the r(n) sequence not be the solution of any linear constant-coefficient difference equation of order K+1 or less. Therefore exponential or low-order polynomial inputs are undesirable for estimation purposes.
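The two-step solution in (6) and (7) can be sketched numerically as follows. This is a minimal illustration, not the author's procedure: the function name `fit_hyperplane` is hypothetical, A is assumed to be the per-point average of the outer products of the observed points, the noise covariance matrix is assumed non-singular, and the result is scaled so that the first coefficient equals 1.

```python
import numpy as np

def fit_hyperplane(points, Sigma):
    """Return (gamma_hat, theta_1) for the generalized least squares fit.

    points : array of shape (S, P), one observed point per row
    Sigma  : (P, P) noise covariance matrix, assumed non-singular here
    """
    X = np.asarray(points, dtype=float)
    A = X.T @ X / X.shape[0]                  # average of the outer products
    # The eigenvalues of Sigma^{-1} A are the roots of |A - theta * Sigma| = 0.
    theta, vecs = np.linalg.eig(np.linalg.solve(Sigma, A))
    i = np.argmin(theta.real)                 # smallest root theta_1
    gamma = vecs[:, i].real
    return gamma / gamma[0], theta[i].real    # scale so first coefficient is 1
```

With noise-free points lying exactly in a hyperplane, the smallest root is zero and the returned vector reproduces the hyperplane coefficients. A symmetric generalized eigensolver would be a natural alternative when one is available.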

6.  Properties  of  the  Maximum  Likelihood  Estimates 

(1) Consistency. Maximum likelihood estimates are, in general, consistent so that

$$\operatorname{p\,lim} \hat{\gamma} = \gamma$$

(2) Bias. For finite S, $\hat{\gamma}$ is generally biased. However, Koopmans has shown that if

$$\Sigma_{ii} \ll M_{ii} \quad \text{for all } i \tag{9}$$

so that the noise variance is small compared with the mean-square values of r(n) and c(n) then the bias is negligible compared with the standard deviation of $\hat{\gamma}$.

(3) Variance. Under the condition (9) but without using the assumption of Gaussian noise Koopmans has obtained, by an involved matrix series representation, an approximation to the covariance matrix of the $\hat{\gamma}_i$,

$$[\operatorname{cov} \hat{\gamma}_i, \hat{\gamma}_j] \approx \frac{1}{S}\, (\gamma^T \Sigma \gamma)\, M_{11}^{-1} \tag{10}$$

Here $M_{11}$ is the matrix formed by deleting the first row and first column of M. The values for i, j = 1 do not appear in this covariance matrix since $\hat{\gamma}_1 = 1$ by assumption. The matrix $M_{11}^{-1}$ is proportional to the covariance matrix of the estimates that would be obtained if the errors occurred along just one coordinate axis so that the standard least squares estimates were appropriate.

The scale factor $\frac{1}{S} (\gamma^T \Sigma \gamma)$ depends upon the true parameter values and the noise covariance matrix and is inversely proportional to the number of observations. We have established by a rather intricate computation which will not be repeated here the basic result that for Gaussian noise (10) is the same as the covariance matrix given by the Cramér-Rao lower bound for joint unbiased estimates.

The quantity $\theta_1$ in (7) is the sum of the squared deviations from the hyperplane of best fit. If (9) holds then it can be shown that $E\,\theta_1 \approx (S-1)/S$ and the order of magnitude of the standard deviation of $\theta_1$ is $\sqrt{2/S}$. Thus $\theta_1$ indicates how well the data fits the estimated coefficients. An excessively large value may suggest that the order of the system which has been assumed is not large enough. Alternatively, if the scale factor of $\Sigma$ is unknown it can be estimated by $\theta_1$.
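The variance approximation can be evaluated directly once the coefficient vector, the noise covariance matrix and M are given. The sketch below assumes that (10) has the form of the scale factor $(\gamma^T \Sigma \gamma)/S$ multiplying the inverse of M with its first row and column deleted; the function name `approx_covariance` is hypothetical.

```python
import numpy as np

def approx_covariance(gamma, Sigma, M, S):
    """Approximate covariance of the estimated coefficients 2..P.

    Assumed reading of (10): (gamma^T Sigma gamma / S) * inv(M_11),
    where M_11 is M with its first row and first column removed.
    """
    gamma = np.asarray(gamma, dtype=float)
    scale = gamma @ np.asarray(Sigma, dtype=float) @ gamma / S
    M11 = np.asarray(M, dtype=float)[1:, 1:]
    return scale * np.linalg.inv(M11)
```

Doubling the number of observed points S halves every entry of the result, which is the inverse proportionality noted in the text.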

7.  Estimates with Overlapping Sets of Values

The estimates $\hat{\gamma}$ are maximum likelihood only with respect to the observed points constructed from the non-overlapping sets of values of the x(n) and y(n) defined in Section 4. Since these points do not contain all the information in the data it appears that improved results would be obtained by taking as observed points every successive set of (K+1) values of the x(n) and the corresponding y(n), which would increase the number of points S by a factor of (K+1). The noise components are then no longer independent from point to point and although the maximum likelihood equations are easy to derive it has not been found possible to solve them in a useful form. If the matrix A is calculated from overlapping sets of values and employed with (6) and (7) it can be shown that no additional bias errors are introduced and it appears that the variance is reduced by a factor of almost 1/(K+1). It is conjectured that when the noise components are large compared with r(n) and c(n) this procedure is efficient but that when they are small a better method may exist. For this procedure, which seems most useful for practical purposes, A becomes
$$A = \frac{1}{N-K+1} \left[ \begin{array}{ccc|ccc} \sum y^2(n) & \sum y(n)\,y(n-1) & \cdots & \sum y(n)\,x(n) & \cdots & \sum y(n)\,x(n-K) \\ \sum y(n-1)\,y(n) & \cdots & & \sum y(n-1)\,x(n) & \cdots & \sum y(n-1)\,x(n-K) \\ \vdots & & & \vdots & & \vdots \\ \sum x(n-K)\,y(n) & \sum x(n-K)\,y(n-1) & \cdots & \sum x(n-K)\,x(n) & \cdots & \sum x^2(n-K) \end{array} \right]$$

where all summations run over n = K, K+1, ..., N. The elements of this matrix are measured auto- and cross-correlation functions of the x(n) and y(n) except that different summations include slightly different sets of values of the products.
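Forming A from overlapping sets of values can be sketched as below. The function name `overlapping_A` is hypothetical, and dividing by the number of points (N - K + 1) is an assumption made so that A stays on the same per-point scale as the noise covariance matrix used in (6) and (7).

```python
import numpy as np

def overlapping_A(x, y, K):
    """Build A from every successive set of K+1 samples of y(n) and x(n).

    Each point is [y(n), ..., y(n-K), x(n), ..., x(n-K)] for n = K..N.
    The average over points (rather than the raw sum) is an assumption.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    N = min(len(x), len(y)) - 1
    rows = []
    for n in range(K, N + 1):
        lags = n - np.arange(K + 1)           # indices n, n-1, ..., n-K
        rows.append(np.concatenate([y[lags], x[lags]]))
    X = np.array(rows)
    return X.T @ X / len(rows)
```

The resulting matrix can be passed to the same generalized eigenvalue solution as the non-overlapping one.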

8.  Properties  of  the  Estimates  for  Other  Types  of  Noise 

If the noise obeys the assumptions of Section 3 except that it is non-Gaussian then the estimates (6) and (7) are no longer maximum likelihood. However, the geometrical interpretation and the fact that the variance is primarily influenced by only the covariance matrix of the noise suggest that these estimates are still reasonably good. It can be shown that under general conditions these estimates remain consistent.

If u(n) and v(n) are sequences of correlated random variables then the noise components of the $X^{(s)}$ are not independent and again the maximum likelihood estimates are not known. If they are stationary time series and the covariance matrix $\Sigma$ is used with (6) and (7) then consistent estimates are still obtained.

9.  Discussion  of  Other  Estimates 


It is of interest to compare the properties of the simpler standard least squares estimates $\gamma^*$ described by Kalman (2). These estimates minimize the sum of squared distances measured along a single coordinate axis and are given by the solution of a set of simultaneous linear equations. With the distance measured along the $y_0$ axis and $\gamma_1^* = 1$, they satisfy all but the first of the P equations

$$A \gamma^* = 0 \tag{11}$$

so the i'th component of $\gamma^*$ is

$$\gamma_i^* = \frac{A_{1i}}{A_{11}} \tag{12}$$

where the $A_{ij}$ are the cofactors of A.

It can be shown that the variances for these estimates are approximately the same as for $\hat{\gamma}$. Unfortunately they are not consistent when noise is present. To demonstrate this suppose S is large and that r(n) has reasonable characteristics so that M converges to some constant matrix. Then under general conditions

$$\operatorname{p\,lim} A = M + \Sigma \tag{13}$$

By Slutsky's theorem (Section 2)

$$\operatorname{p\,lim} \gamma_i^* = \frac{\operatorname{p\,lim} A_{1i}}{\operatorname{p\,lim} A_{11}} \tag{14}$$

so that knowing M and $\Sigma$ these values can be calculated. The asymptotic bias introduced by the non-zero elements of $\Sigma$ can be evaluated by noting that from (2)

$$M \gamma = 0 \tag{15}$$

Whether this bias is significant depends upon the magnitude of the noise and the desired accuracy. An example is given in the next Section.

It is apparent that the solution (6) subtracts out from the matrix A the best estimate $\theta_1 \Sigma$ of the components due to noise. A simpler estimate which is not asymptotically biased is given by the solution of

$$[A - \Sigma]\, \gamma = 0 \tag{16}$$

but this is presumably not so efficient as the maximum likelihood estimates.
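The asymptotic bias of the standard least squares estimate, and its removal by subtracting the noise covariance as in (16), can be checked numerically. The sketch below is illustrative only: it reads the least squares equations as the last P-1 rows of the homogeneous system with the first coefficient fixed at 1, and the matrix M and the noise variances are hypothetical values for a first-order system driven by a white input.

```python
import numpy as np

def ls_limit(A):
    """Solve rows 2..P of A @ g = 0 with g[0] = 1 (fit along the first axis)."""
    A = np.asarray(A, dtype=float)
    g_rest = np.linalg.solve(A[1:, 1:], -A[1:, 0])
    return np.concatenate([[1.0], g_rest])

# Hypothetical first-order system: gamma = [1, beta1, -alpha0].
beta1, alpha0 = 0.5, 2.0
phi_rr0 = 1.0                                    # mean-square input (assumed)
phi_cc0 = alpha0**2 * phi_rr0 / (1 - beta1**2)   # mean-square output, white input
M = np.array([[phi_cc0, -beta1 * phi_cc0, alpha0 * phi_rr0],
              [-beta1 * phi_cc0, phi_cc0, 0.0],
              [alpha0 * phi_rr0, 0.0, phi_rr0]])
Sigma = np.diag([0.5, 0.5, 0.2])                 # output, output, input noise variances

A_limit = M + Sigma                              # large-sample limit of A, as in (13)
g_star = ls_limit(A_limit)                       # biased least squares limit
g_true = ls_limit(A_limit - Sigma)               # noise removed, as in (16)
```

Here `g_star` shrinks both coefficients toward zero by factors depending on the noise-to-signal ratios, while `g_true` recovers the coefficients exactly because M annihilates the true coefficient vector.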


If no noise is present in the x(n) sequence then it has been shown (13) that a set of simultaneous linear equations can be formed which provides consistent estimates of $\gamma$ without further knowledge of $\Sigma$. With noise present in both the x(n) and y(n) sequences the method of Joseph, Lewis and Tou (10) provides estimates without requiring a knowledge of $\Sigma$. They form a set of simultaneous linear equations in terms of the cross-correlation functions of x(n) and y(n) with a signal elsewhere in the system related to x(n) and having uncorrelated noise components. If such a signal is available the method appears quite useful although any optimum properties remain to be established. Since an unfavorable input signal could cause the equations to become singular or poorly conditioned and therefore produce estimates with large variances the necessary restrictions on the input should be investigated.


10.  Example

The calculations of properties of $\hat{\gamma}$ and $\gamma^*$ are now demonstrated by a simple example. Consider the pulse transfer function

$$H(z) = \frac{\alpha_0}{1 + \beta_1 z^{-1}} \tag{17}$$

where $|\beta_1| < 1$, $\beta_1$ and $\alpha_0$ are to be estimated, and u(n) and v(n) obey assumption d) of Section 3 with $\rho = 0$. Denote auto- and cross-correlation functions of the actual r(n) and c(n) sequences by

$$\phi_{rr}(m) = \frac{1}{N+1} \sum_{n=0}^{N} r(n)\, r(n+m) \tag{18}$$

$$\phi_{cc}(m) = \frac{1}{N+1} \sum_{n=0}^{N} c(n)\, c(n+m) \tag{19}$$

$$\phi_{rc}(m) = \frac{1}{N+1} \sum_{n=0}^{N} r(n)\, c(n+m) \tag{20}$$

$[\operatorname{Cov} \hat{\gamma}]$ can be obtained by using the approximation, valid for large S,

$$M \approx \begin{bmatrix} \phi_{cc}(0) & \phi_{cc}(1) & \phi_{rc}(0) \\ \phi_{cc}(1) & \phi_{cc}(0) & \phi_{rc}(-1) \\ \phi_{rc}(0) & \phi_{rc}(-1) & \phi_{rr}(0) \end{bmatrix} \tag{21}$$

The elements of M can be calculated from

$$\phi_{rc}(m) = \sum_{p=0}^{\infty} h(p)\, \phi_{rr}(m-p) \tag{22}$$

and

$$\phi_{cc}(m) = \sum_{p=0}^{\infty} \sum_{q=0}^{\infty} h(p)\, h(q)\, \phi_{rr}(m-p+q) \tag{23}$$

where h(p) is the impulse response given by the inverse z-transform of H(z).

The simplest case is when r(n) is a white-noise-like sequence such that for large N

$$\phi_{rr}(0) \neq 0, \qquad \phi_{rr}(m) \approx 0, \quad m \neq 0 \tag{24}$$

Then it is found that

$$\phi_{cc}(1) = -\beta_1\, \phi_{cc}(0) \tag{25}$$

$$\phi_{rc}(0) = \alpha_0\, \phi_{rr}(0), \qquad \phi_{rc}(-1) = 0 \tag{26}$$

There follows

$$M \approx \begin{bmatrix} \phi_{cc}(0) & -\beta_1 \phi_{cc}(0) & \alpha_0 \phi_{rr}(0) \\ -\beta_1 \phi_{cc}(0) & \phi_{cc}(0) & 0 \\ \alpha_0 \phi_{rr}(0) & 0 & \phi_{rr}(0) \end{bmatrix} \tag{27}$$

From (10),

$$\operatorname{Var} \hat{\beta}_1 \approx \frac{\sigma_v^2 (1 + \beta_1^2) + \sigma_u^2 \alpha_0^2}{S\, \phi_{cc}(0)} \tag{28}$$

$$\operatorname{Var} \hat{\alpha}_0 \approx \frac{\sigma_v^2 (1 + \beta_1^2) + \sigma_u^2 \alpha_0^2}{S\, \phi_{rr}(0)} \tag{29}$$

The asymptotic values of the standard least squares estimates along the $y_0$ axis are found from (14) to be

$$\operatorname{p\,lim} \beta_1^* = \beta_1\, \frac{1}{1 + \sigma_v^2 / \phi_{cc}(0)} \tag{30}$$

$$\operatorname{p\,lim} \alpha_0^* = \alpha_0\, \frac{1}{1 + \sigma_u^2 / \phi_{rr}(0)} \tag{31}$$

The biases are seen to depend on the ratio of the noise variance to the mean-square input or output.

11.  Conclusions

The contribution of the present paper lies in applying the method of maximum likelihood to the problem at hand by utilization of Koopmans' general solution to the hyperplane fitting problem. Some of the properties of the estimates which have been discussed are based on Koopmans' work and others are original results.

These estimates are valid for arbitrary inputs and automatically take into account the initial conditions (stored energy) of the system. The method can easily be extended to include an unknown additive constant (d.c. level) in x(n) and y(n). A continuous system can be handled by approximating it as a sampled-data system. However, the optimum choice of the sampling interval remains to be investigated.

Maximum likelihood estimates of the poles and zeros of the system can be obtained from the maximum likelihood estimates of the coefficients by virtue of the transformation property mentioned in Section 2. The same applies to parameters of a controller which are functions of the coefficients.

Some sampling experiments have been carried out on a desk calculator and have generally supported the theoretical analysis. In applications a digital computer could solve the equations (6) and (7) routinely. Experience indicates that estimates of this nature which are not sensitive to errors in the observed data nevertheless require accurate solutions of the resulting equations. The introduction of approximations such as (16) will often deteriorate the estimates considerably, especially for small S.